Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Mellanox] Enhancement for support PSU LED management #12

Closed
wants to merge 2 commits into from

Conversation

Junchao-Mellanox
Copy link
Owner

- Why I did it

PSU LED management require PSU temperature, PSU temperature threshold, PSU voltage min value and PSU voltage max value to determine PSU LED status.

- How I did it

Implement following API in PSU class for upper layer get information for LED management:

  1. get_temperature
  2. get_temperature_high_threshold
  3. get_voltage_high_threshold (not implemented, hw-mgmt doesn't provide related sysfs for now)
  4. get_voltage_low_threshold (not implemented, hw-mgmt doesn't provide related sysfs for now)

- How to verify it

  1. Manual test on Spectrum1, Spectrum2
  2. Add/adjust test cases in sonic-mgmt and run the case

- Description for the changelog

- A picture of a cute animal (not mandatory but encouraged)

@Junchao-Mellanox
Copy link
Owner Author

@Junchao-Mellanox Junchao-Mellanox deleted the psu-led branch May 7, 2020 10:04
Junchao-Mellanox pushed a commit that referenced this pull request Jul 31, 2020
* src/sonic-telemetry fa8d498...3bd7ca3 (4):
  > Update gnmi deps (#40)
  > [testdata] Update SFP keys to align with new standard (#39)
  > Fixed the parameters for subscribe APIs (#38)
  > Azure ro mode (#34)

* src/sonic-mgmt-common 444aa9a...cc01ce4 (4):
  > Make gnmi dep version the same as in telemetry repo (#17)
  > Cleanup translib and cvl go test cases (#13)
  > Package update and enhancements/fixes in YGOT, and Request Binder (#12)
  > Translib phase I changes (#11)

Note: sonic-telemetry submodule update is dependent upon sonic-mgmt-common submodule update, thus updating both in this patch
Junchao-Mellanox pushed a commit that referenced this pull request Nov 12, 2021
Updated the hw-mgmt pointer to include some bugfixes related to power supply voltages.
Junchao-Mellanox pushed a commit that referenced this pull request Mar 13, 2024
…tically (sonic-net#17847)

#### Why I did it
src/sonic-dash-api
```
* 8f481de - (HEAD -> master, origin/master, origin/HEAD) [misc]: Add utils CLI (#12) (24 hours ago) [Ze Gan]
```
#### How I did it
#### How to verify it
#### Description for the changelog
Junchao-Mellanox pushed a commit that referenced this pull request Dec 11, 2024
#### Why I did it

To fix errors that happen when writing to the queue:

```
Jun  5 23:04:41.798613 r-leopard-56 NOTICE healthd: Caught SIGTERM - exiting...
Jun  5 23:04:41.798985 r-leopard-56 NOTICE healthd: Caught SIGTERM - exiting...
Jun  5 23:04:41.799535 r-leopard-56 NOTICE healthd: Caught SIGTERM - exiting...
Jun  5 23:04:41.806010 r-leopard-56 NOTICE healthd: Caught SIGTERM - exiting...
Jun  5 23:04:41.814075 r-leopard-56 ERR healthd: system_service[Errno 104] Connection reset by peer
Jun  5 23:04:41.824135 r-leopard-56 ERR healthd: Traceback (most recent call last):#12  File "/usr/local/lib/python3.9/dist-packages/health_checker/sysmonitor.py", line 484, in system_service#012    msg = self.myQ.get(timeout=QUEUE_TIMEOUT)#12  File "<string>", line 2, in get#012  File "/usr/lib/python3.9/multiprocessing/managers.py", line 809, in _callmethod#012    kind, result = conn.recv()#12  File "/usr/lib/python3.9/multiprocessing/connection.py", line 255, in recv#012    buf = self._recv_bytes()#12  File "/usr/lib/python3.9/multiprocessing/connection.py", line 419, in _recv_bytes#012    buf = self._recv(4)#12  File "/usr/lib/python3.9/multiprocessing/connection.py", line 384, in _recv#012    chunk = read(handle, remaining)#012ConnectionResetError: [Errno 104] Connection reset by peer
Jun  5 23:04:41.826489 r-leopard-56 INFO healthd[8494]: ERROR:dbus.connection:Exception in handler for D-Bus signal:
Jun  5 23:04:41.826591 r-leopard-56 INFO healthd[8494]: Traceback (most recent call last):
Jun  5 23:04:41.826640 r-leopard-56 INFO healthd[8494]:   File "/usr/lib/python3/dist-packages/dbus/connection.py", line 232, in maybe_handle_message
Jun  5 23:04:41.826686 r-leopard-56 INFO healthd[8494]:     self._handler(*args, **kwargs)
Jun  5 23:04:41.826738 r-leopard-56 INFO healthd[8494]:   File "/usr/local/lib/python3.9/dist-packages/health_checker/sysmonitor.py", line 82, in on_job_removed
Jun  5 23:04:41.826785 r-leopard-56 INFO healthd[8494]:     self.task_notify(msg)
Jun  5 23:04:41.826831 r-leopard-56 INFO healthd[8494]:   File "/usr/local/lib/python3.9/dist-packages/health_checker/sysmonitor.py", line 110, in task_notify
Jun  5 23:04:41.826877 r-leopard-56 INFO healthd[8494]:     self.task_queue.put(msg)
Jun  5 23:04:41.826923 r-leopard-56 INFO healthd[8494]:   File "<string>", line 2, in put
Jun  5 23:04:41.826973 r-leopard-56 INFO healthd[8494]:   File "/usr/lib/python3.9/multiprocessing/managers.py", line 808, in _callmethod
Jun  5 23:04:41.827018 r-leopard-56 INFO healthd[8494]:     conn.send((self._id, methodname, args, kwds))
Jun  5 23:04:41.827065 r-leopard-56 INFO healthd[8494]:   File "/usr/lib/python3.9/multiprocessing/connection.py", line 211, in send
Jun  5 23:04:41.827115 r-leopard-56 INFO healthd[8494]:     self._send_bytes(_ForkingPickler.dumps(obj))
Jun  5 23:04:41.827158 r-leopard-56 INFO healthd[8494]:   File "/usr/lib/python3.9/multiprocessing/connection.py", line 416, in _send_bytes
Jun  5 23:04:41.827199 r-leopard-56 INFO healthd[8494]:     self._send(header + buf)
Jun  5 23:04:41.827254 r-leopard-56 INFO healthd[8494]:   File "/usr/lib/python3.9/multiprocessing/connection.py", line 373, in _send
Jun  5 23:04:41.827322 r-leopard-56 INFO healthd[8494]:     n = write(self._handle, buf)
Jun  5 23:04:41.827368 r-leopard-56 INFO healthd[8494]: BrokenPipeError: [Errno 32] Broken pipe
Jun  5 23:04:42.800216 r-leopard-56 NOTICE healthd: Caught SIGTERM - exiting...
```

When the multiprocessing.Manager is shutdown the queue will raise the above errors. This happens during shutdown - fast-reboot, warm-reboot.


With the fix, system-health service does not hang:

```
root@sonic:/home/admin# sudo systemctl start system-health ; sleep 10; echo "$(date): Stopping..."; sudo systemctl stop system-health; echo "$(date): Stopped"
Thu Oct 17 01:07:56 PM IDT 2024: Stopping...
Thu Oct 17 01:07:58 PM IDT 2024: Stopped
root@sonic:/home/admin# sudo systemctl start system-health ; sleep 10; echo "$(date): Stopping..."; sudo systemctl stop system-health; echo "$(date): Stopped"
Thu Oct 17 01:08:13 PM IDT 2024: Stopping...
Thu Oct 17 01:08:14 PM IDT 2024: Stopped
root@sonic:/home/admin# sudo systemctl start system-health ; sleep 10; echo "$(date): Stopping..."; sudo systemctl stop system-health; echo "$(date): Stopped"
Thu Oct 17 01:09:05 PM IDT 2024: Stopping...
Thu Oct 17 01:09:06 PM IDT 2024: Stopped
```

##### Work item tracking
- Microsoft ADO **(number only)**:

#### How I did it

Remove the call to shutdown, the cleanup will happen automatically when GC runs as per documentation - https://docs.python.org/3/library/multiprocessing.html

#### How to verify it

<!--
If PR needs to be backported, then the PR must be tested against the base branch and the earliest backport release branch and provide tested image version on these two branches. For example, if the PR is requested for master, 202211 and 202012, then the requester needs to provide test results on master and 202012.
-->

Run warm-reboot, fast-reboot multiple times and verify no errors in the log.

#### Which release branch to backport (provide reason below if selected)

<!--
- Note we only backport fixes to a release branch, *not* features!
- Please also provide a reason for the backporting below.
- e.g.
- [x] 202006
-->

- [ ] 201811
- [ ] 201911
- [ ] 202006
- [ ] 202012
- [ ] 202106
- [ ] 202111
- [x] 202205
- [x] 202311
- [x] 202405

#### Tested branch (Please provide the tested image version)

<!--
- Please provide tested image version
- e.g.
- [x] 20201231.100
-->

- [ ] <!-- image version 1 -->
- [ ] <!-- image version 2 -->

#### Description for the changelog
<!--
Write a short (one line) summary that describes the changes in this
pull request for inclusion in the changelog:
-->

<!--
 Ensure to add label/tag for the feature raised. example - PR#2174 under sonic-utilities repo. where, Generic Config and Update feature has been labelled as GCU.
-->

#### Link to config_db schema for YANG module changes
<!--
Provide a link to config_db schema for the table for which YANG model
is defined
Link should point to correct section on https://github.com/Azure/sonic-buildimage/blob/master/src/sonic-yang-models/doc/Configuration.md
-->

#### A picture of a cute animal (not mandatory but encouraged)
Junchao-Mellanox pushed a commit that referenced this pull request Dec 11, 2024
…et#21095)

Adding the below fix from FRR FRRouting/frr#17297

This is to fix the following crash which is a statistical issue

[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
Core was generated by `/usr/lib/frr/zebra -A 127.0.0.1 -s 90000000 -M dplane_fpm_nl -M snmp'.
Program terminated with signal SIGABRT, Aborted.
#0  0x00007fccd7351e2c in ?? () from /lib/x86_64-linux-gnu/libc.so.6
[Current thread is 1 (Thread 0x7fccd6faf7c0 (LWP 36))]
(gdb) bt
#0  0x00007fccd7351e2c in ?? () from /lib/x86_64-linux-gnu/libc.so.6
#1  0x00007fccd7302fb2 in raise () from /lib/x86_64-linux-gnu/libc.so.6
#2  0x00007fccd72ed472 in abort () from /lib/x86_64-linux-gnu/libc.so.6
#3  0x00007fccd75bb3a9 in _zlog_assert_failed (xref=xref@entry=0x7fccd7652380 <_xref.16>, extra=extra@entry=0x0) at ../lib/zlog.c:678
#4  0x00007fccd759b2fe in route_node_delete (node=<optimized out>) at ../lib/table.c:352
#5  0x00007fccd759b445 in route_unlock_node (node=0x0) at ../lib/table.h:258
#6  route_next (node=<optimized out>) at ../lib/table.c:436
#7  route_next (node=node@entry=0x56029d89e560) at ../lib/table.c:410
#8  0x000056029b6b6b7a in if_lookup_by_name_per_ns (ns=ns@entry=0x56029d873d90, ifname=ifname@entry=0x7fccc0029340 "PortChannel1020")
    at ../zebra/interface.c:312
#9  0x000056029b6b8b36 in zebra_if_dplane_ifp_handling (ctx=0x7fccc0029310) at ../zebra/interface.c:1867
#10 zebra_if_dplane_result (ctx=0x7fccc0029310) at ../zebra/interface.c:2221
#11 0x000056029b7137a9 in rib_process_dplane_results (thread=<optimized out>) at ../zebra/zebra_rib.c:4810
#12 0x00007fccd75a0e0d in thread_call (thread=thread@entry=0x7ffe8e553cc0) at ../lib/thread.c:1990
#13 0x00007fccd7559368 in frr_run (master=0x56029d65a040) at ../lib/libfrr.c:1198
#14 0x000056029b6ac317 in main (argc=9, argv=0x7ffe8e5540d8) at ../zebra/main.c:478
Junchao-Mellanox pushed a commit that referenced this pull request Dec 24, 2024
To fix a statistical issue. The original fix was done in FRRouting/frr#17297. However to accommodate 8.5.4 the patch in the PR was added.

[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
Core was generated by `/usr/lib/frr/zebra -A 127.0.0.1 -s 90000000 -M dplane_fpm_nl -M snmp'.
Program terminated with signal SIGABRT, Aborted.
#0  0x00007fccd7351e2c in ?? () from /lib/x86_64-linux-gnu/libc.so.6
[Current thread is 1 (Thread 0x7fccd6faf7c0 (LWP 36))]
(gdb) bt
#0  0x00007fccd7351e2c in ?? () from /lib/x86_64-linux-gnu/libc.so.6
#1  0x00007fccd7302fb2 in raise () from /lib/x86_64-linux-gnu/libc.so.6
#2  0x00007fccd72ed472 in abort () from /lib/x86_64-linux-gnu/libc.so.6
#3  0x00007fccd75bb3a9 in _zlog_assert_failed (xref=xref@entry=0x7fccd7652380 <_xref.16>, extra=extra@entry=0x0) at ../lib/zlog.c:678
#4  0x00007fccd759b2fe in route_node_delete (node=<optimized out>) at ../lib/table.c:352
#5  0x00007fccd759b445 in route_unlock_node (node=0x0) at ../lib/table.h:258
#6  route_next (node=<optimized out>) at ../lib/table.c:436
#7  route_next (node=node@entry=0x56029d89e560) at ../lib/table.c:410
#8  0x000056029b6b6b7a in if_lookup_by_name_per_ns (ns=ns@entry=0x56029d873d90, ifname=ifname@entry=0x7fccc0029340 "PortChannel1020")
    at ../zebra/interface.c:312
#9  0x000056029b6b8b36 in zebra_if_dplane_ifp_handling (ctx=0x7fccc0029310) at ../zebra/interface.c:1867
#10 zebra_if_dplane_result (ctx=0x7fccc0029310) at ../zebra/interface.c:2221
#11 0x000056029b7137a9 in rib_process_dplane_results (thread=<optimized out>) at ../zebra/zebra_rib.c:4810
#12 0x00007fccd75a0e0d in thread_call (thread=thread@entry=0x7ffe8e553cc0) at ../lib/thread.c:1990
#13 0x00007fccd7559368 in frr_run (master=0x56029d65a040) at ../lib/libfrr.c:1198
#14 0x000056029b6ac317 in main (argc=9, argv=0x7ffe8e5540d8) at ../zebra/main.c:478
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants